Bangor University - epSpread

VAST 2011 Challenge

Mini-Challenge 1 - Characterization of an Epidemic Spread

Authors and Affiliations:

Llyr ap Cenydd, Bangor University, llyr.ap.cenydd@bangor.ac.uk

Rick Walker, Bangor University, rick.walker@bangor.ac.uk [PRIMARY contact]

Serban Pop, Bangor University, serban@bangor.ac.uk

Helen Miles, Bangor University, eeu621@bangor.ac.uk

Chris Hughes, Bangor University, ees603@bangor.ac.uk

William Teahan, Bangor University, w.j.teahan@bangor.ac.uk

Jonathan C. Roberts, Bangor University, j.c.roberts@bangor.ac.uk

Tool(s):

We developed the epSpread tool in Processing at Bangor from May 2010 to provide both a multiple-view visualization of the data and a storyboarding timeline for analysis. Using a component-based architecture, streamgraphs, word clouds and map visualisations are combined with a simple querying interface that lets us quickly ask and answer questions in this scenario. These groups of components can then be arranged along a timeline to help build the underlying narrative and explore different hypotheses. We used Google Wave to share and discuss these timelines and hypotheses, and a translation API to decipher the messages in foreign languages.

The tool and source can be downloaded from http://github.com/RickWalker/epSpread

Video:

Bangor-epSpread-MC1-video


ANSWERS:

MC 1.1 Origin and Epidemic Spread: Identify approximately where the outbreak started on the map (ground zero location). If possible, outline the affected area. Explain how you arrived at your conclusion.

By tracing the earliest reports of known outbreak symptoms, we can pinpoint ground zero: a truck on fire and spilling cargo on the 610 highway bridge, at approximately 11:00am on 17th May.

There are two initial vectors for the spread, producing two distinct sets of symptoms: the wind carries the agent eastwards across the populous Downtown/Uptown zones (Figure 1), and the river carries it south-west through Plainville and Smogtown (Figure 2).

Figure 1: From 17th May, 11:00am to 18th May, 6:00pm. Spread of contagion via wind (truck on fire).

Figure 2: From 17th May, 11:00am to 21st May, 12:00am. Spread of contagion via river (cargo spill).



MC 1.2 Epidemic Spread: Present a hypothesis on how the infection is being transmitted. For example, is the method of transmission person-to-person, airborne, waterborne, or something else? Identify the trends that support your hypothesis. Is the outbreak contained? Is it necessary for emergency management personnel to deploy treatment resources outside the affected area? Explain your reasoning.

Answer:

To present a hypothesis on infection transmission, we must first show how to identify the outbreak, then examine patterns in its spread. Figure 3, below, shows a streamgraph of numbers of tweets per day in the main view, and a graph of tweets per hour for that day in the secondary view. In the streamgraph, tweet numbers are normalised by the total for the day, so that we can see when events occur in different districts. A number of local events are visible, but clearly some significant, non-localised event (the outbreak) occurs from 17th May onwards.

Figure 3: Streamgraph visualisation of number of tweets by city zone, normalised by tweets-per-day.
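
The per-day normalisation behind the streamgraph can be sketched as follows. This is a minimal illustration rather than the epSpread code: the `(day, zone)` tuple layout, the function name and the sample rows are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def normalise_by_day(tweets):
    """Return {day: {zone: share}}, where the shares for each day sum to 1.

    `tweets` is an iterable of (day, zone) pairs. This layout is a
    hypothetical stand-in for the real message records.
    """
    counts = defaultdict(Counter)
    for day, zone in tweets:
        counts[day][zone] += 1

    shares = {}
    for day, zone_counts in counts.items():
        total = sum(zone_counts.values())
        # Dividing by the daily total removes day-to-day volume changes,
        # leaving only each zone's share of that day's traffic.
        shares[day] = {zone: n / total for zone, n in zone_counts.items()}
    return shares


# Invented sample rows, for illustration only.
sample = [
    ("16 May", "Downtown"), ("16 May", "Uptown"),
    ("17 May", "Downtown"), ("17 May", "Downtown"),
    ("17 May", "Downtown"), ("17 May", "Plainville"),
]
shares = normalise_by_day(sample)
```

Because each day is scaled independently, a zone that suddenly takes a larger share of the day's messages stands out even when the overall tweet volume changes.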

Using messages sent in the first three days as a reference corpus, we ranked word unigrams with a relative entropy-based probabilistic measure, which highlights unigrams whose probability in a given period differs from their probability in the reference corpus. Probabilities were estimated naively, as the frequency count for the unigram divided by the number of unigram tokens. From 18th-20th May, unigrams such as ‘chills’, ‘diarrhea’, ‘sweats’ and ‘fever’ were ranked the highest using this measure, as shown in Figure 4.

Figure 4: Word cloud of unigrams that were ranked highest between 18th and 20th May using our relative entropy-based probabilistic measure.
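
One plausible form of such a ranking is sketched below, scoring each unigram by its pointwise contribution to the relative entropy between the target period and the reference corpus. The exact measure we used is not reproduced here; the scoring formula, the `floor` count for words unseen in the reference corpus, and the toy token lists are assumptions made for illustration.

```python
import math
from collections import Counter


def rank_unigrams(target_tokens, reference_tokens, floor=0.5):
    """Rank target-period unigrams by p * log(p / q), highest first.

    p and q are naive relative frequencies in the target and reference
    corpora. `floor` is a hypothetical pseudo-count for words that never
    occur in the reference corpus, so that the logarithm stays defined.
    """
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    n_target = sum(target.values())
    n_reference = sum(reference.values())

    scores = {}
    for word, count in target.items():
        p = count / n_target
        q = max(reference[word], floor) / n_reference
        scores[word] = p * math.log(p / q)
    return sorted(scores, key=scores.get, reverse=True)


# Toy corpora, invented for the example.
reference_tokens = "traffic coffee traffic work coffee work".split()
target_tokens = "fever chills fever traffic coffee fever".split()
ranked = rank_unigrams(target_tokens, reference_tokens)
```

Words that are frequent in the target period but rare or absent in the reference corpus score highest, while everyday words that are equally common in both score near or below zero.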

Focusing on this period, we examined message content and language in more detail. The message structure made this necessary: the word ‘fever’ in a message, for example, was often used in reference to someone else having fever. We used regular expressions to differentiate between these cases, and isolated messages from those users reporting, in the first person, the symptoms of the infection listed in the question description. This allowed us to form some hypotheses about transmission. We contend that the infection has two separate vectors for transmission - it is both airborne and waterborne - and present here observed trends in support of these two cases.
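
The first-person filtering can be illustrated with a small sketch. The patterns below are not the expressions used in epSpread; the symptom list, the pronoun lists and the `classify` function are hypothetical, and real messages would need considerably broader patterns.

```python
import re

# Illustrative symptom vocabulary, drawn from words mentioned in this entry.
SYMPTOM = r"(?:fever|chills|sweats|headache|cough|nausea|diarrhea)"

# Hypothetical patterns: a subject, then a symptom within the same sentence.
FIRST_PERSON = re.compile(
    r"\b(?:i|my|me)\b[^.!?]*\b" + SYMPTOM + r"\b", re.IGNORECASE
)
THIRD_PERSON = re.compile(
    r"\b(?:he|she|they|his|her|their"
    r"|my (?:mom|dad|wife|husband|son|daughter|friend|boss))\b"
    r"[^.!?]*\b" + SYMPTOM + r"\b",
    re.IGNORECASE,
)


def classify(message):
    """Label a message 'self', 'other' or 'none' by who has the symptom."""
    # Third-person is checked first so that "my son has a fever" is not
    # mistaken for a first-person report because of the word "my".
    if THIRD_PERSON.search(message):
        return "other"
    if FIRST_PERSON.search(message):
        return "self"
    return "none"


labels = [
    classify("I have a terrible fever and chills"),
    classify("My son has a fever"),
    classify("Traffic is awful on the bridge today"),
]
```

Restricting the match to a single sentence (`[^.!?]*`) is a design choice that avoids pairing a pronoun in one sentence with a symptom in the next.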

With Ground Zero as described in MC 1.1, we can see that two distinct sets of symptoms are present in the outbreak: gastrointestinal ailments track the river from the 610 bridge towards the south-west, while fever (chills, sweats), headache and cough follow a cone to the east of the same bridge.

We are told that drinking water for districts is sourced from nearby lakes and rivers. Combined with the truck accident on the bridge, this provides a possible path for an agent into the water supply. In fact, we see a pattern of symptoms developing: as shown in the timeline provided by our tool in Figure 5, tweets reporting nausea appear shortly before those reporting vomiting and abdominal pain, with practically no overlap. In contrast, diarrhea is reported over the entire period 19th-21st May.

Figure 5: clockwise from top left, nausea, vomit, abdominal pain and diarrhea. While the spread pattern is the same, the start and duration differ.

The wind direction at the time of the truck accident at ground zero is from the west (so towards the east), consistent with the eastward spread described in MC 1.1. Looking at tweets that report fever, headache, cough, chills or sweats gives a clear cone shape in the hours following the accident. This shape becomes less clear as those initially exposed move from the Downtown and Uptown zones during working hours to the suburbs outside of those hours. By using our tool to select only those users known to be present in Downtown and Uptown in the hours following the accident, we can confirm this hypothesis: roughly 25% of them later report chills, fever or sweats. The intersection between the attendees of the technology convention that takes place the day after the accident and those reporting illness is around 0.1% - essentially, no one at the convention reports these symptoms. This implies that the airborne spread has been dispersed by Wednesday 19th, and that infection occurs solely on the 18th.

Figure 6: conference attendees do not seem to be infected - almost no one who was present at the conference reports any of the symptoms at any time.

We can also compute the intersection of users reporting gastrointestinal symptoms with those reporting fever symptoms (chills, sweats, fever, headache, cough). Startlingly, there is none: not one user who reports the gastrointestinal symptoms also reports suffering from any of the fever/cough/headache symptoms at any point. There are clearly at least two methods for transmission, producing different sets of symptoms.

We also considered the possibility of further spread through person-to-person transmission, by looking at the intersection of users who tweeted reporting fever with those who tweeted reporting that someone else had fever. It is highly likely that at least some members of the latter group are in close physical contact with the former: as family, friends or carers. However, this intersection is practically null (0.2%). This strongly suggests that person-to-person transmission does not occur.
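
The intersections quoted in this answer amount to set operations over user identifiers. A minimal sketch follows; the choice of denominator (the size of the first group) is an assumption, as are the function name and the invented user IDs.

```python
def overlap_fraction(group_a, group_b):
    """Fraction of users in group_a who also appear in group_b.

    Using len(group_a) as the denominator is an assumption made for this
    sketch; other conventions (e.g. the union of both groups) are possible.
    """
    group_a, group_b = set(group_a), set(group_b)
    if not group_a:
        return 0.0
    return len(group_a & group_b) / len(group_a)


# Invented user IDs, for illustration only.
gastro_users = {"u1", "u2", "u3"}
fever_users = {"u4", "u5"}
exposed_users = {"u4", "u5", "u6", "u7"}

gastro_and_fever = overlap_fraction(gastro_users, fever_users)
exposed_with_fever = overlap_fraction(exposed_users, fever_users)
```

The same function covers each comparison in the text: exposed users against later symptom reports, convention attendees against ill users, and first-person against third-person fever reports.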

Figure 7: heatmap of symptoms over the period 19th-20th May. The outbreak is contained within the regions initially affected, with other minor hotspots at hospital locations.

Finally, to address the issue of outbreak containment, we need to take the population into account. We use the time of day of each tweet and the daytime/overnight population of the city zone from which it was sent to weight each message’s contribution to a heatmap. As shown in Figure 7, the outbreak is contained within two distinct areas: the Downtown and Uptown zones, and the region downstream of the bridge that sources its water from the river. Outside of the affected areas, it would seem prudent to deploy resources further downstream to tackle the waterborne spread. The airborne agent has been dispersed by 20th May and this is confirmed by the streamgraph: chills, fever, headache, cough and sweats are all past peak occurrence by this date, but the number of tweets reporting some of the waterborne symptoms, particularly ‘vomit’ and ‘diarrhea’, is still rising.
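
The population weighting can be sketched per zone as follows. This is an illustration under stated assumptions rather than the epSpread heatmap code: the population figures, the 8:00-18:00 daytime window and the per-1,000 scaling are all hypothetical, and the real heatmap works on map positions rather than whole zones.

```python
from collections import defaultdict

# Hypothetical populations; real values would come from the challenge data.
POPULATION = {
    "Downtown": {"day": 200_000, "night": 50_000},
    "Plainville": {"day": 50_000, "night": 100_000},
}


def is_daytime(hour, start=8, end=18):
    """Assumed working-hours window; the actual boundaries may differ."""
    return start <= hour < end


def weighted_intensity(tweets, population=POPULATION):
    """Sum symptom reports per zone, each scaled by who is present.

    `tweets` is an iterable of (zone, hour) pairs. Each report contributes
    1000 / population, so the result reads as reports per 1,000 people
    present in the zone at that time of day.
    """
    intensity = defaultdict(float)
    for zone, hour in tweets:
        period = "day" if is_daytime(hour) else "night"
        intensity[zone] += 1000.0 / population[zone][period]
    return dict(intensity)


# Invented reports: (zone, hour of day).
reports = [("Downtown", 10), ("Downtown", 22), ("Plainville", 10)]
intensity = weighted_intensity(reports)
```

Weighting in this way stops densely populated zones from dominating the heatmap simply because more people are there to tweet.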